Crowd-Sourcing of Human Judgments of Machine Translation Fluency

Authors

  • Yvette Graham
  • Timothy Baldwin
  • Alistair Moffat
  • Justin Zobel
Abstract

Human evaluation of machine translation quality is a key element in the development of machine translation systems, as automatic metrics are validated through correlation with human judgment. However, achieving consistent human judgments of machine translation quality is not easy, with decreasing levels of consistency reported in annual evaluation campaigns. In this paper we describe experiences gained during the collection of human judgments of the fluency of machine translation output using Amazon's Mechanical Turk service. We gathered a large collection of crowd-sourced human judgments for the machine translation systems that participated in the WMT 2012 shared translation task, collected across eight different assessment configurations, to gain insight into possible causes of, and remedies for, inconsistency in human judgments. Overall, approximately half of the workers carry out the human evaluation to a high standard, but effectiveness varies considerably across target languages, with dramatically higher numbers of good-quality judgments for Spanish and French, and the reverse observed for German.
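The abstract does not spell out how good-quality workers were separated from the rest. One widely used quality-control device in crowd-sourced MT evaluation is to pair genuine system outputs with deliberately degraded copies of the same sentences and test whether a worker reliably scores the degraded copy lower. A minimal sketch of such a filter follows; the data layout, the normal approximation, and the 0.05 threshold are all assumptions for illustration, not the paper's exact procedure.

```python
import math

def keep_worker(quality_pairs, alpha=0.05):
    """Decide whether a worker's fluency judgments look reliable.

    quality_pairs: list of (genuine_score, degraded_score) tuples, each
    holding the worker's 0-100 fluency score for a genuine system output
    and for a deliberately degraded copy of the same sentence. A careful
    worker should score the degraded copy consistently lower; we check
    this with a one-sided paired t-test (normal approximation).
    """
    diffs = [g - d for g, d in quality_pairs]
    n = len(diffs)
    if n < 2:
        return False  # too little evidence either way
    mean = sum(diffs) / n
    var = sum((x - mean) ** 2 for x in diffs) / (n - 1)
    if var == 0:
        return mean > 0
    t = mean / math.sqrt(var / n)
    p = 1 - 0.5 * (1 + math.erf(t / math.sqrt(2)))  # one-sided p-value
    return p < alpha

# A worker who reliably penalizes degraded copies passes the filter:
print(keep_worker([(78, 40), (65, 55), (90, 30), (70, 45)]))  # True
```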


Similar Papers

Active Learning and Crowd-Sourcing for Machine Translation

In recent years, corpus-based approaches to machine translation have become predominant, with Statistical Machine Translation (SMT) being the most actively progressing area. The success of these approaches depends on the availability of parallel corpora. In this paper we propose Active Crowd Translation (ACT), a new paradigm in which active learning and crowd-sourcing come together to enable automatic...
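As a rough sketch of the active-learning side of such a paradigm, the system might route to crowd translators only the sentences it is currently least confident about. The selection rule and the confidence function below are illustrative assumptions, not ACT's actual method.

```python
import heapq

def select_for_crowd(sentences, confidence, k=100):
    """Return the k source sentences the current MT system is least
    confident about, to be routed to crowd translators.

    `confidence` is a placeholder callable (e.g. length-normalized log
    probability of the system's best translation); ACT's actual
    selection strategies may differ from this simple uncertainty rule.
    """
    return heapq.nsmallest(k, sentences, key=confidence)
```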


Is Machine Translation Getting Better over Time?

Recent human evaluation of machine translation has focused on relative preference judgments of translation quality, making it difficult to track longitudinal improvements over time. We carry out a large-scale crowd-sourcing experiment to estimate the degree to which state-of-the-art performance in machine translation has increased over the past five years. To facilitate longitudinal evaluation, ...
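Comparing judgments collected from different workers at different times requires putting raw scores on a common footing. One standard approach, sketched below under an assumed data layout, is to standardize each worker's scores against that worker's own mean and spread before averaging per system.

```python
from collections import defaultdict
from statistics import mean, pstdev

def standardized_system_scores(judgments):
    """judgments: list of (worker_id, system_id, raw_score) triples.

    Each worker's raw 0-100 scores are z-scored against that worker's
    own mean and standard deviation, removing individual differences in
    how workers use the scale, before averaging per system. The data
    layout here is an assumption for illustration.
    """
    by_worker = defaultdict(list)
    for worker, _, score in judgments:
        by_worker[worker].append(score)
    stats = {w: (mean(s), pstdev(s)) for w, s in by_worker.items()}

    by_system = defaultdict(list)
    for worker, system, score in judgments:
        mu, sigma = stats[worker]
        if sigma > 0:  # skip workers with no score variation
            by_system[system].append((score - mu) / sigma)
    return {system: mean(zs) for system, zs in by_system.items()}
```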


English to Hindi Translation Protocols for an Enterprise Crowd

We present early results on crowd-sourcing translations in an enterprise setting. We show that several weak translators can together converge on translation quality higher than any of them could plausibly achieve individually. We share lessons learned about post-editing of translations and the effort perceived by the crowd. A key observation is that a "machine-human-human" protocol that utilizes two-hop post-editing can provid...
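Structurally, the "machine-human-human" protocol is a two-stage post-editing pipeline. The sketch below is schematic only; mt and post_edit are placeholder callables standing in for the MT system and for dispatching a task to a crowd worker, not any real API.

```python
def two_hop_post_edit(source, mt, post_edit):
    """Schematic "machine-human-human" pipeline: a machine translation
    is post-edited by one worker, then refined by a second worker.
    `mt` and `post_edit` are placeholder callables, not any real API.
    """
    draft = mt(source)             # machine: raw translation
    first_pass = post_edit(draft)  # human hop 1: fix obvious errors
    return post_edit(first_pass)   # human hop 2: refine the first edit
```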


Crowd-based MT Evaluation for non-English Target Languages

This paper investigates the feasibility of using crowd-sourcing services for the human assessment of machine translation quality of translations into non-English target languages. Non-expert graders are hired through the CrowdFlower interface to Amazon’s Mechanical Turk in order to carry out a ranking-based MT evaluation of utterances taken from the travel conversation domain for 10 Indo-Europe...


Crowd-based Evaluation of English and Japanese Machine Translation Quality

This paper investigates the feasibility of using crowd-sourcing services for the human assessment of machine translation quality of English and Japanese translation tasks. Non-expert graders are hired in order to carry out a ranking-based MT evaluation of utterances taken from the domain of travel conversations. Besides a thorough analysis of the obtained non-expert grading results, data quality...
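This abstract and the previous one both rely on ranking-based MT evaluation. A common way to turn such rankings into a per-system score, sketched below, is each system's share of won pairwise comparisons with ties excluded; the papers' exact aggregation may differ from this convention.

```python
from collections import defaultdict
from itertools import combinations

def pairwise_win_ratio(rankings):
    """rankings: one dict per crowd judgment, mapping system_id to its
    rank within that judgment (1 = best).

    Returns each system's share of won pairwise comparisons, with ties
    excluded -- one common way to turn ranking-based judgments into a
    system-level score. The papers' exact aggregation may differ.
    """
    wins = defaultdict(int)
    comparisons = defaultdict(int)
    for ranking in rankings:
        for a, b in combinations(ranking, 2):
            if ranking[a] == ranking[b]:
                continue  # a tie gives no win to either system
            wins[a if ranking[a] < ranking[b] else b] += 1
            comparisons[a] += 1
            comparisons[b] += 1
    return {s: wins[s] / comparisons[s] for s in comparisons}
```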




Publication date: 2013